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In a spiked population model, the population covariance matrix has all its eigenvalues 
equal to units except for a few fixed eigenvalues (spikes). Determining the number of 
spikes is a fundamental problem which appears in many scientific fields, including signal 
processing (linear mixture model) or economics (factor model). Several recent papers 
studied the asymptotic behavior of the eigenvalues of the sample covariance matrix 
(sample eigenvalues) when the dimension of the observations and the sample size both 
grow to infinity so that their ratio converges to a positive constant. Using these results, 
we propose a new estimator based on the difference between two consecutive sample 
eigenvalues. 
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1. Introduction 

In a spiked population model, the population covariance matrix has all its eigenva- 
lues equal to units except for a few hxed eigenvalues (spikes). This model appears in 
many scientific fields often with different names. In economics, it is called "factors 
model" within the Ross Arbitrage Pricing Theory (APT) and the aim is to relate 
observed data (assets) to a small dimensional set of unobserved variables which 
are then estimated [T] . In physics of mixture, "linear mixture model" are naturally 
considered for various phenomena [2]. In wireless communication, a signal emitted 
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by a source is modulated and received by an array of antennas which will permit 
the reconstruction of the original signal. 

An important question to be addressed under this model is how many factors/ 
components/signals there are. It is generally a first step preliminary to any further 
study such as estimation and forecasting. 

Many methods for determining the number of factors have been developed, 
based on the minimum description length (MDL), Bayesian model selection or 
Bayesian Information Criteria (BIC) (See [3]). Nevertheless, these methods are 
based on asymptotic expansions for large sample size and may not perform well 
when the dimension of the data p is large compared to the sample size n. To avoid 
this problem of high dimension, several methods have been recently proposed using 
the random matrix theory, such as Harding [4] or Onatski [5] in economics, and 
Kritchman & Nadler [5] in array processing or chemometrics literature. 

In this paper, we present a new estimator for the number of spikes from high- 
dimensional data. Our approach is based on the results of Bai & Yao [7] and Paul 
[5] which give the limiting distributions of the extreme eigenvalues of a sample 
covariance matrix coming from a spiked population model, and a recent result of 
Benaych-Georges, Guionnet & Maida [5]. The obtained results are presented in 
Section 3. 

The remaining sections of the paper are organized as follows. In Section 2, we 
introduce the spiked population model, and recall known results on the almost sure 
limits of extreme eigenvalues which lead to the idea of our estimator. In Section 3 we 
define precisely our estimator and prove its consistency in the case of simple spikes 
with known variance. Next we give a method of estimation in the case of simple 
spikes with unknown variance. In Section 4, we define the factor/linear mixture 
model that we link to the spiked population model and we compare our method to 
those of Harding [4] and Kritchman & Nadler [6]. We consider the case of spikes 
with greater multiplicity in Section 5. Finally, we discuss the extension to the gen- 
eralized spiked population model. Throughout the paper, simulation experiments 
are conducted to access the quality of the proposed estimation. 

2. Spiked Population Model 

We consider x = EV^y, where y g l p is a zero-mean random vector of i.i.d. 
components, E is an orthogonal matrix and 



where £ has K non null and non unit eigenvalues (ak)i<k<K with respective multi- 
plicity (nk)i<k<K {n\ + - ■ - + riK = qa)- Therefore, the eigenvalues of the population 
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covariance matrix V are unit except the otj, called spike eigenvalues. Notice that, 
if the observations are Gaussian, we may assume that V is diagonal by using a 
suitable orthogonal transformation. 

Let (xi)(Kj<„) be n independent copies of x. The sample covariance matrix is 



1 " 



n 

i=l 

It is assumed in the sequel that q is fixed, and p and n are related so that when 
n — > +oo, ^ — ¥ c > 0. Moreover, we assumed that eci > • • • > > 1 + ^fc for all 
i € {1, . . . , if}. For a ^ 1, we define the function 

ca 



0(a) = a 



a- 1 

Let A„,i > A„,2 > • • • > A„ iP be the eigenvalues of the sample covariance matrix 
S„. Let Si = n\ + ■ ■ ■ + for 1 < i < K . Baik and Silverstein [TU] proved that, 
under a moment condition on x, for each k € {1, . . . , i^} and s&_i < j < Sfe almost 
surely, 

K,j — > a 2 (/)(a k ). 

In other words, with the hypotheses that a k > 1 + \fc for all fc, and has 
multiplicity n k , then is the limit of n k packed sample eigenvalue {A n) j, 

Sfe-i + 1 < J < 5^}. They also prove that for all 1 < i < L with a prefixed 
range L almost surely, 

\„, qo+l ^b = a 2 (l + y/Z) 2 . 

Our aim is to estimate qo when only S„ is known. The idea is to use, as suggested 
in Onatski [5], differences between consecutives eigenvalues 

Sn.j = A n J — A n j + i . 

Indeed, applying the results quoted above it is easy to see that a.s. if j > qo, 
S n ,j while when j < qo, S n ,j tends to a positive limit if the a k are different. 
Thus it is possible to detect go from index-numbers j where S n j becomes small. 

3. Case of Simple Spikes with Known Variance cr 2 

In this section, we suppose that a is known and that all the spikes are simple, i.e 
ni = ■ ■ ■ = rix = 1 • Under these hypotheses the population eigenvalues are 

spec(U) = a 2 { ai, ■ ■ ■ ,a qo; 1, • • • ,1). 

go p-qo 

We also need the following assumption: 

Assumption 3.1. The entries y 1 of the random vector y have a symmetric law and 
a sub-exponential decay, that is there exists positive constants C, C such that, for 
all t > C, 

P(|y l | > t c ) < e~ l . 
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Especially, the Gaussian vectors satisfy this hypothesis. 

As stated previously the main observation is that when one follows the sample 
eigenvalues in a descending order, the successive spacings S n j shrink to small values 
when approaching non-spiked values. Therefore, our estimation method will use a 
carefully determined threshold d n . We propose to estimate qo by the following 

q„ = max{j € {1, . . . , s} : Vfc £ {1, . . .,j}, S ntj > d n and S nJ+1 < d n }, 

where s > qo is a fixed number big enough, and d n is a level to determine. In 
practice, the integer s should be thought as a preliminary bound on the number of 
possible spikes. 

3.1. Consistency 

Theorem 3.1. Let (x,) (i<j< n ) be n copies i.i.d. ofx= EV^y, where y € K p is a 
zero-mean random vector of i.i.d. components which satisfies Assumvtions [KT\ and 
E is an orthogonal matrix. Assume that 

where E has qo non null, non unit and different eigenvalues u\ > ■ ■ ■ > a qa > 1+^/c. 
Assume that ^ — > c > when n —¥ +oo. 

Let (d n ) n >o be a real sequence such that d n — > and n 2 / 3 d n — > +oo. Then the 
estimator q n is strongly consistent, i.e q n —¥ qo almost surely when n — > +oo . 

In the sequel, we will assume that a 2 — 1 (If it is not the case, we consider 
^r-). For the proof, we need two theorems. The first, Proposition 13. 11 shows that 
the limiting law of A nj — 4>(a.j) is Gaussian (Bai and Yao [7] and Paul [8]): 

Proposition 3.1. Assume that the entries x l of x satisfy E(||x l || 4 ) < +oo, 
o.j > 1 + yfc for all 1 < j < qo and have multiplicity 1. Then as p. n — > +oo so 
that — — > c, 

V^(A nJ -<t>{OLj)) A#,(7 2 ( Q3 )) 

where a 2 {a 3 ) = 2a 2 (l - j^w) ■ 

The second Proposition 13 . 21 is issued from the Proposition 5.8 of [S]: 

Proposition 3.2. Under the Assumvtions HOI for all 1 < i < L with a prefixed 
range L, 

2 

— (A„, go+l -b) = Op(l), 
where [3 = (1 + y/c)(l + VcF 1 )? . 
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We also need the following lemma: 

Lemma 3.1. Let (X n ) n >o be a tight sequence of random variables. Then for all 
real sequence (u n )n>o which diverges to infinity, 

n\x n \ > Un ) -> o. 

Proof. As (X n )n>o is a tight sequence, for all e > 0, it exists a compact K such 
that, for all n £ N, P(X ra ^ K) < e. Furthermore, as u n — > +oo, it exists n 6 N 
such that for all n > N, [-u n ,u„] D K. So P(|X„| > u n ) < P(X„ £ K) < e. 
Consequently, P(|X„| > u n ) -> 0. □ 

Proof, of Theorem 13. II We have 

{<?« = <?o} = {<7o = max{j : Sj > d„}} 

= {Vj G {1, . . . , (Jo}, <5 n ,j > ^n} n {8„ : qo + l < d n }- 

Therefore 

P(<7n = <Zo) = P f| {5 n ,j > d n } n {5 ntqo+ i < d n } 
\1<3<<?0 

= i - p y {5 nJ < d„> u {5 n , qo+ i > d n } 

\l<i<90 

> < - p (<Wo+i > *»)• 

3=1 

Case of j = qo + 1. In this case, 5 n>qa +i — A n>go +i — X n .q a +2 (non-spike eigenvalues). 
We consider the following sequence of random variables 



Y n — —^-{\ n ,q +i — b). 



T 

By Proposition 13.21 (Y n ) n >i is a tight sequence. So by using Lemma 13. 1[ for any 
sequence (a„)„>o, a n — > +oo we have 

P{\Y n \ > a n )^0. 

Therefore 



/ n 3 

< «n) = P I — (|/am6da„ !go+i - 6| < a r , 



|^n, 90 +i — ^1 — ~P 
1. 
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We choose d n — > such that n 2 ^ 3 d n — >• +00. So we have 
with 

in = [b ± dn] . 

It follows 

e J„}) -> l. 

Therefore 

P(<5n,?o + l > dr.) -> 

Case o/ 1 < j < go- These indices correspond to the spike eigenvalues. By using 
Proposition 13. II and the previous argument, it is easy to show that we can choose 
a real sequence (6 n ) n >oj b n — > such that y/nb n — > +00 and 

P(A„, 3 e l„,i) -> 1, 

where 

l„j = [</>{otj) ± &„] . 

Therefore 

• For all 1 < j < qo, we have 

> 0(aj-) - 4>(a J+1 ) - b n ) > P({A nj - € l„j} n {A„ J+a e -> 1. 

Let 

C„,j = 4>{&j) - _ b n- 

• For j = goj <5ri.g — ^n.go — Ki,q +i- By using the first section of the proof, 
one can show that 

TP(Sn, go > (t>{a qo )-b-{b n + d n )) >P({A„, 9o € \ n , qo ] n {A n ,, 0+ i € J„}) -> 1. 
Let 

Cn,g = '/'("go) _ b ~ ( b n + dn) . 

• Therefore for all < j < qo we have 

H5 nJ > c ntj ) -> 1 P(5„, j < c nj ) -> 0. 
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As d n -> and for all 1 < j < q , c nj -> Cj > 0, it exists N € N : Vn > iV, 
So we have 

Conclusion. F(S n ^ qo+ i > d n ) — > and 2j=i P(^n,j < ^n) — > 0, therefore 

P(<?n=<Zo) — > 1. □ 

n— >-+oo 

3.2. Simulation experiments 

Now we will illustrate the previous result by some simulations. First, we have to 
chose the sequence d n to be used. Theoretically speaking, all the sequences satisfying 
the requirement d n — > such that n 2 / 3 d„ — > +oo are convenient. We tested several 
sequences and we decided to take one of the form which a sequence (a„)„>o 

proportional to \J2 log logn : this idea came from that, as in the case of the mean of 
i.i.d random variables, the X n j corresponding to the non-spikes tend to a gaussian 
law. So we can conjecture a result analog to the law of the iterated logarithm for 
the A raj , j > go- Finally, we choose a n — Ay/2 log log n and simulate two different 
models: one with dispersed spikes which should lead to an easier estimation of qoj 
and a more difficult case with closer spikes: 

• Model 1: q = 5, (a u a 2 , a 3 , a 4 , a 5 ) = (259.72,17.97,11.04,7.88,4.82); 

• Model 2: q = 4, (ai, a 2 , a 3 , a 4) = (7,6,5,4). 

Note that the values of Model 1 have been chosen to be the same as in [4] . For each 
model, two different values of c, 0.3 and 0.6, are considered. We give in Tables 1-2 
and 3, respectively, the distribution of q ni its mean and mean squared error over 
1000 independent replications. The frequency of q n = qo is given in Figure 1. 



Table 1. Mean, mean squared error and empirical distribution of q n over 1000 independent 
replications for Model 1. 





Distribution of q n 


ip>n) 


Mean 


MSE 


1 


2 


3 


4 


5 


6 


7 


(30,100) 


5.057 


0.212 


0.001 


0.007 


0.009 


0.0 


0.883 


0.1 


0.002 


(60,200) 


5.081 


0.107 


0.001 


0.001 


0.0 


0.0 


0.91 


0.088 


0.0 


(120,400) 


5.079 


0.073 


0.0 


0.0 


0.0 


0.0 


0.921 


0.079 


0.0 


(240,800) 


5.069 


0.064 


0.0 


0.0 


0.0 


0.0 


0.931 


0.069 


0.0 
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Table 2. (Continued) Mean, mean squared error and empirical distribution of q n over 1000 
independent replications for Model 1. 





Distribution of q n 


(p, ™) 


Mean 


MSE 


1 


2 


3 


4 


5 


6 


7 


(60,100) 


5.056 


0.139 


0.001 


0.004 


0.003 


0.002 


0.914 


0.076 


0.0 


(120,200) 


5.08 


0.098 


0.0 


0.001 


0.002 


0.0 


0.91 


0.087 


0.0 


(240,400) 


5.072 


0.079 


0.002 


0.0 


0.0 


0.0 


0.924 


0.075 


0.0 


(480,800) 


5.072 


0.069 


0.0 


0.0 


0.0 


0.0 


0.929 


0.07 


0.001 



Table 3. Mean, mean squared error and empirical distribution of q n over 1000 
independent replications for Model 2. 





Distribution of q n 


(p,n) 


Mean 


MSE 





1 


2 


3 


4 


5 


(30,100) 


3.718 


1.086 


0.0 


0.001 


0.059 


0.0 


0.778 


0.085 


(60,200) 


3.925 


0.582 


0.013 


0.024 


0.019 


0.0 


0.857 


0.087 


(120,400) 


4.005 


0.331 


0.01 


0.01 


0.001 


0.0 


0.902 


0.077 


(240,800) 


4.062 


0.110 


0.002 


0.001 


0.0 


0.0 


0.924 


0.073 


(60,100) 


3.478 


1.655 


0.053 


0.086 


0.059 


0.001 


0.734 


0.067 


(120,200) 


3.818 


0.823 


0.025 


0.033 


0.024 


0.0 


0.853 


0.065 


(240,400) 


3.969 


0.394 


0.009 


0.015 


0.011 


0.0 


0.893 


0.072 


(480,800) 


4.051 


0.108 


0.003 


0.0 


0.0 


0.0 


0.934 


0.063 




Fig. 1. Frequency of q n = qo over 1000 independent replications. 



In both cases, we can observe the asymptotic consistency of the estimator. Com- 
paring the two models, except the last case (p, n) = (480,800), the estimator per- 
forms better in Model 1 than in Model 2. This phenomenon is due to the fact that 
the differences between consecutive eigenvalues are smaller than in Model 2 so that 
it is more difficult to distinguish spikes from non spikes. 
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Within a same model, the convergence is slower in the c = 0.6 case. We could ex- 
plain this by the fact that the gap between two consecutive spike eigenvalues stays 
the same, and when c increases, the spectrum of S n is more dispersed, so that the 
differences S n j from non- spikes are larger and again our detection problem is more 
difficult. 



It is worth mentioning that the chosen constant d n — „°/3° s " /^ leads to a slight 
over-estimation of go for the tested sizes (p,n). This finite-sample behaviour could 
be improved with a more sophisticated choice of d n which however seems a difficult 
point to address. 



4. Case of Simple Spikes with Unknown Variance 

In practice, the scale parameter a 2 is also unknown and we need to estimate it as 
well. First, we will explain how to do in the non-spikes (null) case, i.e. V = cr 2 I p , 
and then in the case with spikes. 

4.1. Estimation of the variance in the white case 

We consider a zero-mean random vector x £ W with population covariance matrix 

V = cov(x) = a 2 I p . 

We keep the previous assumptions. We will use the law of large numbers to es- 
timate the unknown variance a 2 . We have the following theorem (Marcenko and 
Pastur [IT], Bai and Silverstein |12j ) 

Proposition 4.1. Assume that, for any r\ > : 

^^Xjk] 2 !^ k ^ v ^) -> when n +oo. 



rfnpj^ 



Then, with probability one, the empirical spectral distribution (ESD) F Sn of S n 
weakly converges to the Marcenko- Pastur distribution with ratio index c and scale 
parameter a 2 , denoted by F c ' a (x), which has a density function 



v 2(x) -[2d^V(b + -x)(x-b-) l fb-<x<b+ 
[ U otherwise 

where b~ = a 2 (I - y/c) 2 and b+ = <r 2 (I + </c) 2 . 

Note that a 2 represents the mean of the limiting distribution. Moreover, it is 
well-known that under the condition of the Proposition [4J] it holds almost surely, 

p 



I , ^ 



y i=l 
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4.2. Determining the number of spikes with an unknown variance. 

As we notice in the first section, when the variance is known and different of one, we 
oniy have to divide the consecutive difference Si. n by this variance. As the variance 
is unknown, we wiil replace it by the estimate a 2 = | Y^=i ^n,»> which converges 
almost surely to a 2 when p — > +00. Nevertheless, because of the spikes, the variance 
of g 2 will be greater than the one in the null case. The variance will be minimum 
if we only take the mean of the non-spike eigenvalues i.e those that have an index 
i > <7o + 1- The problem is that we don't know go- By consequence, the idea is to 
make a first estimation g° of go with a 2 — ^ X)f=i ^n,»- Then, if g" > 0, we set 
°i = F^F Si=gO+i Vi (So we have a 2 > a 2 ), and we reestimate g by q\ using 
this new estimation. We repeat it until we find an indice k such that q\ = g„ +1 . 
If such an indice doesn't exist, the algorithm will stop at the preliminary bound 
k = s fixed initially. To sum up, here is the algorithm: 

ql=0 

sigma2=l/p* (lambda_l+. . . +lambda_p) 

q2="estimator of the known variance case with division by sigma2" 

while q2~=ql do 
ql:=q2 

sigma2=l/(p-ql)*(lambda_(ql+l)+. . .+lambda_p 

q2="estimator of the known variance case with division by sigma2" 
end 

result= (ql , sigma2) 

4.3. Simulation experiments 

We conduct the simulations with two values of the variance a 2 — 1, and a 2 = 500 
to see if a high variance will influence the estimation. We keep the same other 
parameters as in the previous simulation study of Section 3 and estimate a 2 and 
the number of spikes with the method explained above. Additional to the statistics 
about the spikes number estimator q n , we provide also those about the final estimate 
a 2 of the unknown variance. The results are displayed in Tables 4 to 8. 



Table 4. Mean, mean squared error and empirical distribution of q n , mean and mean squared error of <r 
over 1000 independent replications for Model 1 and a 2 = 1. 





Distribution of q n 


a' 1 


(p,n) 


Mean 


MSE 


1 


2 


3 


4 


5 


6 


7 


Mean 


MSE 


(30,100) 


5.052 


0.338 


0.003 


0.015 


0.008 


0.0 


0.849 


0.0 


0.125 


0.955 


0.015 


(60,200) 


5.108 


0.112 


0.0 


0.001 


0.0 


0.0 


0.89 


0.107 


0.002 


0.97 


0.0 


(120,400) 


5.069 


0.076 


0.0 


0.001 


0.0 


0.0 


0.927 


0.072 


0.0 


0.986 


0.0 


(240,800) 


5.084 


0.077 


0.0 


0.0 


0.0 


0.0 


0.916 


0.084 


0.0 


0.993 


0.0 
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Table 5. (Continued) Mean, mean squared error and empirical distribution of g n , mean and mean squared 
error of a 2 over 1000 independent replications for Model 1 and cr 2 = 1. 





Distribution of q n 


a 2 


(60,100) 5.087 0.236 
(120,200) 5.095 0.092 
(240,400) 5.07 0.065 
(480,800) 5.067 0.063 


0.001 0.009 0.004 0.0 0.865 0.122 0.002 
0.0 0.0 0.001 0.0 0.902 0.097 0.0 
0.0 0.0 0.0 0.0 0.93 0.07 0.0 
0.0 0.0 0.0 0.0 0.933 0.067 0.0 


0.943 0.003 
0.971 0.0 
0.985 0.0 
0.993 0.0 



Table 6. Mean, mean squared error and empirical distribution of q n , mean and mean squared error of a 1 
over 1000 independent replications for Model 2 and cr 2 = 1. 





Distribution of q n 


a 


•i 


(p,n) 


Mean 


MSE 





1 


2 


3 


4 


5 


6 


Mean 


MSE 


(30,100) 


3.362 


2.019 


0.079 


0.078 


0.091 


0.0 


0.658 


0.094 


0.0 


1.052 


0.043 


(60,200) 


3.806 


1.023 


0.032 


0.038 


0.026 


0.0 


0.805 


0.098 


0.001 


0.994 


0.005 


(120,400) 


3.983 


0.483 


0.019 


0.008 


0.004 


0.0 


0.878 


0.091 


0.0 


0.991 


0.001 


(240,800) 


4.071 


0.144 


0.003 


0.001 


0.001 


0.0 


0.907 


0.088 


0.0 


0.994 


0.0 


(60,100) 


3.367 


1.898 


0.069 


0.081 


0.096 


0.001 


0.674 


0.079 


0.0 


1.003 


0.012 


(120,200) 


3.781 


1.04 


0.034 


0.034 


0.036 


0.0 


0.806 


0.089 


0.001 


0.986 


0.002 


(240,400) 


3.965 


0.472 


0.015 


0.015 


0.007 


0.0 


0.892 


0.071 


0.0 


0.99 


0.0 


(480,800) 


4.052 


0.125 


0.002 


0.003 


0.0 


0.0 


0.926 


0.069 


0.0 


0.994 


0.0 




Fig. 2. Frequency of q n = qo over 1000 independent replications with cr 2 = 1. 
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Table 7. Empirical distribution of q n , mean and mean squared error of a 2 over 1000 independent 
replications for Model 1 and a 2 = 500. 





Distribution of q n 


a 2 


ip,n) 


1 


2 


3 


4 


5 


6 


7 


Mean 


MSE 


(30,100) 


0.003 


0.012 


0.005 


0.0 


0.823 


0.155 


0.002 


474.909 


3281.714 


(60,200) 


0.0 


0.001 


0.0 


0.0 


0.904 


0.094 


0.001 


485.019 


99.558 


(120,400) 


0.0 


0.001 


0.0 


0.0 


0.918 


0.080 


0.001 


492.608 


21.244 


(240,800) 


0.0 


0.0 


0.0 


0.0 


0.914 


0.086 


0.0 


496.316 


3.519 


(60,100) 


0.002 


0.008 


0.006 


0.001 


0.870 


0.113 


0.0 


472.816 


688.994 


(120,200) 


0.0 


0.002 


0.0 


0.0 


0.898 


0.099 


0.001 


485.49 


55.489 


(240,400) 


0.0 


0.0 


0.0 


0.0 


0.928 


0.071 


0.001 


492.699 


7.242 


(480,800) 


0.0 


0.0 


0.0 


0.0 


0.933 


0.067 


0.0 


496.377 


1.654 



Table 8. Empirical distribution of q n , mean and mean squared error of a 2 over 1000 independent 
replications for Model 2 and a 2 = 500. 





Distribution of q n 


a 2 


(P,n) 





1 


2 


3 


4 


5 


6 


Mean 


MSE 


(30,100) 


0.079 


0.088 


0.090 


0.0 


0.649 


0.093 


0.001 


528.651 


11223.872 


(60,200) 


0.037 


0.037 


0.029 


0.0 


0.794 


0.103 


0.0 


498.032 


1478.184 


(120,400) 


0.009 


0.01 


0.005 


0.0 


0.880 


0.096 


0.0 


494.613 


107.355 


(240,800) 


0.003 


0.0 


0.002 


0.0 


0.918 


0.075 


0.002 


496.813 


8.770 


(60,100) 


0.071 


0.104 


0.059 


0.001 


0.687 


0.078 





501.754 


3126.083 


(120,200) 


0.036 


0.038 


0.043 


0.0 


0.809 


0.074 


0.0 


493.687 


438.063 


(240,400) 


0.013 


0.007 


0.009 


0.0 


0.900 


0.071 


0.0 


494.445 


39.686 


(480,800) 


0.004 


0.001 


0.0 


0.0 


0.941 


0.054 


0.0 


496.836 


3.576 




Fig. 3. Frequency of q n = qo over 1000 independent replications with a 2 = 500. 



First, we can see the asymptotic consistancy of the estimator of go hi all the 
four cases. If we compare these simulations with the known variance case, we can 
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Fig. 4. Mean of <? 2 over 1000 independent replications. 



see that the estimation is less accurate in the small (p, n). Furthermore, as in the 
previous case, the convergence is slower in the c = 0.6 case and the estimator per- 
forms better in Model 1 than in Model 2, for both values of a 2 . The estimation of 
qo is more accurate with an unknown variance of a 2 = 500. 

We also give the mean and mean squared error of q n in the a 2 — 1 case (Tables 4-5 
and 6) to compare with Tables 1-2 and 3, where a 2 — 1 also, to see the effect of 
its estimation. The variance and the bias are higher especially for small values of 
(p, n) in this case with unknown variance. 

The estimation of a 2 performs well, but it seems to be underestimated. There is 
no particular difference between the two values of c in Model 1 but in Model 2, 
contrary to the estimation of qo, the convergence seems to be faster in the c = 0.6 
case for a 2 . The variance of the estimator decreases with the increase of n and p, 
and is less in the c = 0.6 case. As expected, the variance is lower in the a 2 — 1 case. 
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5. Comparaison with two Related Methods 

In signal processing or econometric literature, the factor model (or linear mixture 
model) is often used. This model is defined as follows: let (xj = x(i,)) (i<j< n ) be an 
i.i.d n-sample of p-dimensional random vectors satisfying 

So 

X(t) = 2J CLkSk(t) + au(t) 

fc=i 

= As(t) +au{t), 

where 

• s(t) = (si(t), . . . , s qo (t))' £ R 90 are qo random factors (or signals) assumed to 
have zero mean, unit variance and mutually uncorrelated; 

• A = (01, . . . , a qo ) is a p x go fixed unknown matrix of rank qo (response vectors 
or factor loadings); 

• a € R is the noise level, u ~ A/"(0, 

It is easy to show that in this case, the population covariance matrix takes the 
form of a spiked population model: the spikes are only slightly modified. If we denote 
by a' the vector of spikes in the factor model, we have the following relationship 
with our original vector a 

a' , 
a = — + 1. 

Here determining the number of spikes qo means the detection of the number of 
factors/signals qo . We will explain and compare two methods from Econometrics 
(Harding [4]) and signal processing (Kritchman & Nadler [6]), respectively. 

5.1. Method of Harding and comparison 

In his paper [4], Harding uses less restrictive hypotheses as the sequence (u(t)) is 
not necessarily independent, but he simulates a Gaussian model. His general idea is 
to compare the spectral moments of S„ with the ESD of S n without the factors (or 
spikes), and to remove the largest eigenvalues one by one in S n until a "distance" 
between the moments is minimum. 

More precisely, the variance of the noise is seen as a parameter 8 and his idea 
is to write S„ = S„ + fi„ (rank(S„) = qo) as a sum of a finite rank perturbation 
S„ of the noise covariance il n - Let n(S„) be the vector of the first s moments of 
the empirical spectral distribution (ESD) of the covariance matrix S n , n(57„) the 
equivalent for J7„ and 11(0) its limit as p and n — ¥ +00, 2 — > c. Here is the procedure 
of Harding: 
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• First, compute the moments H(6) of the asymptotic eigenvalue distribution of 
the covariance matrix of fl n for a large (p, n) sample. 

• By Bai and Silverstein ([12]), we have that p (II(fi n ) - 11(6)) -A 7V(A, W). Con- 
sequently, estimate 9 by: 

9 = argmin e (IL(6) - H(S n )' W~ x (11(6) - IL(S n ) . 

V v ' 

J(0) 

where 14 7 is a consistent estimate of W, calculate by estimated 6 from a first step 
estimation with W = \ p . 

• Next, remove the largest eigenvalue of the spectrum of S n and re-estimate the 
parameter 6 as previously to get a new estimate 6%. 

• This step is repeated by progressively removing large eigenvalues and for prefixed 
number of times to get a sequence of estimates 62, #3, ...etc. 

• Finally, among the minimized objective functions J(6i) choose the order one 
which corresponds to the smallest minimized value: 

<7o = argmin,; J(6i). 

Actually, we know that for q fixed and p, n — > +00, n(S„) — > n((9). So the 
criterion is the minimization of the variance W = W (6): it decreases until qq (until 
we have removed the eigenvalues corresponding to the spikes), then it stays stable. 
The procedure of Harding leads to an underestimation of qo , at p and n fixed. That 
is why he penalized the function J with a function of type kdg(p, n), where k is the 
number of eigenvalues removed, 9 is the estimated variance at the step q and g(p, n) 
is a function such that g(p, n) — > when p, n — > +00. The finally proposed choice 
for g is the following function given by Bai and Ng [3] based on a BIC criterion 

9 (p,n)=( P -±^)ln(j^-). 

\ pn J \p + nj 

For his simulation experiment, he tested four different "distances" but we only 
keep the one based on the BIC criterion which is the best. Furthermore, we don't 
give all cases he tested. The simulation design was a little bit different, indeed 
Harding does not choose the spikes directly, but he generates s(t) as a Gaussian 
law Af(0, \ p ) and A in a deterministic way. We calculate the corresponding spikes 
and it leads to the following values: 



• (p, n) = (30, 100): (a 5 , a 4 , a 3 , a 2 , «i) = 

• (p, n) = (90, 100): (a 5 , a 4 , a 3 , a 2 , «i) = 

• (p,n) = (210,300): (a 5 , a 4 , a 3 , a 2 , oc\) 

• (p, n) — (250,500): (a 5 , a 4 , a 3 , a 2 , ai) 



(3.817, 6.877, 10.038, 16.973, 258.719) 
(3.692, 7.276, 10.785, 18.101, 259.010) 
= (3.649, 7.377, 10.992, 18.418, 259.083) 
= (3.634, 7.448, 11.057, 18.453, 259.005) 



Nonetheless, these cases stay very close. Below we compare his results to ours. We 
only give in Table 9 the mean and mean squared errors of the estimator as reported 
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in Harding's paper. 



Table 9. Compared mean and mean squared error of our q n and a 2 and those of Harding 
over 5000 independent replications and a 2 = 1. 





<?o 


S 2 




Harding 


estimator 


Our estimator 


Harding 


estimator 


Our estimator 


(p,n) 


Mean 


MSE 


Mean 


MSE 


Mean 


MSE 


Mean 


MSE 


(30,100) 


5.028 


0.028 


5.087 


0.266 


0.942 


0.004 


0.946 


0.008 


(90,100) 


5.040 


0.048 


5.049 


0.232 


0.944 


0.001 


0.943 


0.0 


(210,300) 


5.004 


0.004 


5.087 


0.082 


0.982 


0.0 


0.980 


0.0 


(250,500) 


5.002 


0.002 


5.077 


0.072 


0.989 


0.0 


0.988 


0.0 



Both methods perform well and their results are overall very close except that 
Harding's estimation yields a slightly smaller MSE for q . However, one should have 
in mind that this estimation has a very complex construction and a rigorous justifi- 
cation of its different steps is still open. Moreover, the spikes in Table 9 are large and 
well-separated one from another; it remains unclear how this method will perform 
in a case where the spikes are much smaller and close like in Model 2, considered 
in Sections 3 and 4. By contrast, our estimator has a very simple construction and 
we proved its consistency under reasonable assumptions. 

5.2. Method of Kritchman & Nadler and comparison 

These authors assume the Gaussian case. In the absence of spikes, nS n follows a 
Wishart distribution with parameters n,p. In this case, Johnstone |13j gave the 
asymptotic distribution of the largest eigenvalue of S n . 

Proposition 5.1. LetS n be the sample covariance matrix of n vectors distributed 
as A/"(0, <7 2 l p ) ; and A nj i > A rlj 2 > ■ ■ • > A„ iP be its eigenvalues. Then, when n — > + 
co, such that — > c > 

*(^<^-+*)->m->o 

where b = (l + ^/c) 2 , /3 n>p = (l + y/1[) (l + 3 and Fi is the i-th Tracy-Widom 
distribution. 

Assuming the variance a 2 is known. To distinguish a spike eigenvalue A from a 
non-spike one at an asymptotic significance level 7, their idea is to check whether 

A„, fc >a 2 (^ S (7) + 6) (5.1) 

where the value of 5(7) can be found by inverting the Tracy-Widom distribu- 
tion. This distribution has no explicit expression, but can be computed from a 
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solution of a second order Painleve ordinary differential equation. Their estima- 
tor is based on a sequence of nested hypothesis tests of the following form: for 
k = 1,2,..., min(p, n) — 1, 

Hq: qo > k vs. Hi: qo < k — 1 . 

For each value of k, they test the likelihood of the fc-th eigenvalue A n ,fe as arising 
from a signal or from noise as (|5.1I) . If (|5.1I) is satisfied, 'Ho is accepted and k is 
increased by one. The procedure stops once an instance of Hq is rejected and the 
number of spikes is estimated to be q n> 2 = k—1. Formally, their estimator is defined 

by 

q n ,2 = argmin fe ^X,^ k < a 2 { ^^^ s {l) + ~ L 

When (j is unknown, they estimate it by the same method we used. For their 
simulations, they use four different settings, with a 2 = 1 

• Al: a' = (200, 50), c = 4 (i.e. a = (201, 51)); 

• A2: a' = (200,50), c=l; 

• Bl: a' = (200, 50, 10, 5), c = 4 (i.e. a = (201, 51, 11,6)); 

• B2: a' = (200, 50, 10, 5), c = 1; 

with p = 64 and p = 1024. Notice that contrary to ours and those of Harding, in 
their simulation, c > 1 and the difference between two consecutive spikes is higher. 
We add two settings with different variance 

• A2': a' = (200, 50), c = 1, a 2 = 20 (i.e. a = (11, 3.5)); 

• B2': a' = (200, 50, 10, 5), c = 1, a 2 = 2 (i.e. a = (101, 26, 6, 3.5)); 

and p = 64. The results are displayed in tables 10 and 11. 



Table 10. Summary for p = 64 showing the frequency of 
<?0 = <?0- 





Setting 


Our estimator 


Estimator KN 


Al 


(p, n) = (64, 16) 


0.943 


0.994 


A2 


(p, n) = (64, 64) 


0.966 


0.993 


A2' 


(p, n) = (64, 64) 


0.602 


0.513 


Bl 


(p,n) = (64, 16) 


0.348 


0.238 


B2 


(p, n) = (64, 64) 


0.947 


0.995 


B2' 


(p, n) = (64, 64) 


0.734 


0.682 



With small p and n, both estimator performs well, except for the A2', Bl, and 
B2' cases where the spikes are closer to 1 + y/c than in the other cases. 
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Table 11. Summary for p = 1024 showing the frequency of go = 

go- 



Setting 


Our estimator 


Estimator KN 


Al; (p,n) = (1024, 256) 


0.995 


0.994 


A2; (p,n) = (1024, 1024) 


0.986 


0.993 


Bl; (p,n) = (1024,256) 


0.999 


0.999 


B2; (p,n) = (1024, 1024) 


0.986 


0.994 



With larger p and u, the results from both methods are comparable. Neverthe- 
less, theoritical properties remain unclear for the KN estimator: it is proved that 

lim P(g„. 2 > g ) = 1, 

p,n— »+oo 

and, in the one factor case (go = 1) that 

lim P (q n 2 > g ) = 7- 

That is by construction, the proposed estimator cannot be fully consistent but 
nearly consistent with an incompressible asymptotic error of 7. Actually the authors 
are using a very small test level 7 = 0.005 in their experiments. Whether this 
property remains true for general case with more than one spike stays open and 
even so, this near-consistency is a bit unsatisfactory from a theoretical point a view. 

6. Case of Spikes with Multiplicity greater than one 

The problem with two identical spikes is that the difference between the corres- 
ponding eigenvalues of the sample covariance matrix will tend to zero. Neverthe- 
less, our method still works: we can explain it by the fact that the convergence 
of the \ n ,i-, f° r * > Qo (non-spikes) is in Op (-373), whereas that of the difference 
corresponding of two identical spikes is in Op (^7n) (Consequence of theorem 3.1 
of Bai & Yao [7]). Furthermore, the variance in the convergence of this difference 
is 2a 2 I 1 — , 3n3 ) ~ 2a 2 , which is quite high for high spikes. A complete justi- 
fication of our method in this case with multiple spikes is still under investigation. 
Here we provide some simulation results in order to have a first idea about its per- 
formance. 

We will only consider the known variance case. If it is not the case, the procedure 
explained before will apply without any problem. Here are the results with the 
same simulation design as previously, except that we introduce multiple spikes. We 
consider two models: 

• Model 3: g = 6, (ai, a 2 , a 3 , a 4 , a 5 , a 6 ) = (259.7,259.7,18,11.1,7.9,4.8); 

• Model 4: go = 6, (ax, 013, a.4, a$, ae) = (7, 6, 6, 6, 5, 4). 

For each model, two different values of c, 0.3 and 0.6, are considered, and we 
give in Figure 5 the frequency of q n = go and in Table 12 the mean and the mean 
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squared error of our estimator over 1000 independent replications. 



Table 12. Mean and mean squared error of Q n OVGr 
1000 independent replications for Model 1 and 2. 





Model 3, <jo = 6 


Model 4, go = 6 


(p,n) 


Mean 


MSE 


Mean 


MSE 


(30,100) 


6.085 


0.168 


4.529 


4.393 


(60,200) 


6.077 


0.121 


4.86 


4.199 


(120,400) 


6.088 


0.082 


5.31 


3.061 


(240,800) 


6.073 


0.068 


5.597 


2.051 


(60,100) 


6.043 


0.151 


4.118 


4.797 


(120,200) 


6.092 


0.108 


4.614 


4.453 


(240,400) 


6.081 


0.074 


5.159 


3.447 


(480,800) 


6.079 


0.073 


5.562 


2.058 



Model 3 Model 4 




100 300 500 700 ^ 100 300 500 700 



n n 

Fig. 5. Frequency of q n = go over 1000 independent replications. 

In both cases, we can observe the asymptotic consistency of the estimator, but 
the convergence is slower in Model 4: indeed, the eigenvalue spacings are smaller. 
Furthermore, the values of the spikes are small, so that the variance in the conver- 
gence of the spikes is not very high and the fluctuations of the difference are smaller 
than in Model 3. 

7. Extension to the generalized spiked population model 

In |14) . the author define the generalized spiked population model: the covariancc 
matrix is extended to a general T from I. Once we have corresponding Tracy- 
Widom limits for sample eigenvalues converging to the edges of support intervals, 
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our approach can be readily adapted to this situation. However such results are 
lacking. 
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